Abstract

The ability to answer complex questions posed in Natural Language depends on (1) the depth of the available semantic representations and (2) the inferential mechanisms they support. In this paper we describe a QA architecture where questions are analyzed and candidate answers generated by (1) identifying predicate argument structures and semantic frames from the input and (2) performing structured probabilistic inference using the extracted relations in the context of a domain and scenario model. A novel aspect of our system is a scalable and expressive representation of actions and events based on Coordinated Probabilistic Relational Models (CPRM). In this paper we report on the ability of the implemented system to perform several forms of probabilistic and temporal inferences to extract answers to complex questions. The results indicate enhanced accuracy over current state-of-the-art Q/A systems.

1  Introduction 

Current Question Answering (QA) systems extract answers from large text collections by (1) classifying the answer type they expect; (2) using question keywords or patterns associated with questions to identify candidate answer passages; and (3) ranking the candidate answers to decide which passage contains the exact answer. A few systems also justify the answer by performing abduction in first-order predicate logic (Moldovan et al., 2003). This paradigm is limited by the assumption that the answer can be found because it uses the question words. Although this may happen sometimes, this assumption does not cover the common case where an informative answer is missed because its identification requires more sophisticated processing than named entity recognition and the identification of an answer type. Therefore we argue that access to rich semantic structures derived from domain models as well as from questions and answers enables the retrieval of more accurate answers as well as inference processes that explain the validity and contextual coverage of answers.

We consider several stages of deeper semantic processing for answering complex questions. A first step in this direction is the incorporation of “semantic parsers” that recognize predicate-argument structures or semantic frames when processing both questions and documents. A second step is the identification of a topic model that contributes to the interpretation of the question and generates a possible index in an off-line battery of ontologies. The third step consists of building a scalable and expressive model of actions and events which allows the sophisticated reasoning imposed by QA within complex scenarios. We embed the three forms of semantic representations and the inference they enable in a novel, flexible QA architecture that allows us to evaluate the impact of each new form of semantic information on the accuracy of answering complex questions.

The remainder of this paper is organized as follows. In Section 2 we present the semantic knowledge that we extract from questions and answers as well as our novel QA architecture. In Section 3 we detail our model of event structure. Section 4 presents the types of inference that are associated with the event structure whereas Section 5 details the results of the evaluations. Section 6 summarizes the conclusions.

2  Semantic  Structures  for  QA 

Processing complex questions involves the identification of several forms of complex semantic structures. First we need to recognize the answer type that is expected, which is a rich semantic structure in the case of a complex question, or a mere concept in the case of a factual question. Second, we need to identify the question class or the question pattern. Third, in the case of a complex question, which is part of a scenario, we need to model the topic of the scenario.

At least three forms of information are needed for detecting the answer type: (1) question classes and named entity classes; (2) syntactic dependency information; and (3) semantic information taking the form of (i) predicate-argument structures or semantic frames and (ii) the representation of the question topic. The following question illustrates the significance of each of the three forms of information:

Q1: “What stimulated India’s missile program?” The question stem “what” is ambiguous, as multiple answer types could be associated with a question pattern “What stimulated X?”. To find candidate answers, the recognition of “India” and other related named entities, e.g. “Indian”, as well as the name of the “Prithvi missile” or its related program is important. To better process question Q1, the syntactic dependencies enable the recognition of predicate-argument structures. The predicate-argument structure of Q1 is:

PREDICATE: Stimulate
ARG0 (role = agent): ANSWER (part 1)
ARG1 (role = thing increasing): India’s missile program
ARG2 (role = instrument): ANSWER (part 2)

Figure 1: QA architecture based on several forms of semantic structures.

The predicate-argument structure was built based on the definitions of the PropBank project (Kingsbury et al., 2002). The structure indicates that the answer may have the role of agent or even the role of instrument. When additional information from FrameNet (Baker et al., 1998) is used, we find that the answer may have four other semantic roles, derived as frame elements of two distinct frames:


None of these semantic roles are fully specified. To interpret the semantic information constrained by the thematic roles, we also need access to a topic model of the scenario in which the question is being asked. For example, for the question Q2: “How can a biological weapons program be detected?”, the topic model consists of (a) a set of typical relations between topic concepts; and (b) a set of possible paths of actions. As illustrated in Figure 1, the identification of (a) predicate-argument structures and (b) semantic frames contributes to the recognition of the expected answer as well as to the formation of the topic model.

Question Q2 is mapped into its pattern and its focus, which has the role of the topic of the question. The document passages retrieved for the specific topic can be used to extract the most relevant topic relations with the method detailed in Section 2.2. The event structure, detailed in Section 3, enables the recognition of possible paths of action in the format of chains between the events lexicalized in the topic relations. The set of possible paths of actions generates different interpretations of the question’s focus, which facilitate the mapping of the original predicate-argument structure into other predicate structures in which the semantic type of the answer has less ambiguity. Figure 2 illustrates the mapping of the predicate detect into the predicates produce and acquire, which can be extracted in parallel. This mapping, enabled by the topic model, corresponds to the decomposition of the original complex question into a set of less complex questions.

Figure 2: Question processing based on topic models.

Because the model for event structure is capable of (1) incorporating domain knowledge in OWL-based representations¹ and (2) performing several forms of inference on this knowledge, it can be used to extract candidate answers from the passages retrieved by the topic relations. The QA architecture that takes advantage of these semantic structures and the inference they enable is illustrated in Figure 1. The syntactic parse is produced by the Collins parser (Collins, 1996), the Named Entity Recognizer (NER) is an implementation of the NER reported in (Bikel et al., 1999), whereas the predicate-argument structures and the frame elements are parsed with the techniques described in Section 2.1. All four of these operations are performed both in the question processing module and in the document processing module. The topic model, generated at question processing, has three roles: (1) it provides an index for the event structures to find ontological information; (2) it refines the definition of the answer type; and (3) it improves the quality of the retrieved answer passages because it makes topic-relevant relations available. The derivation of the topic model is based on the predicate-argument structures derived from the question, whereas the answer type and the event structures rely on the frame semantics available from questions and relevant passages. Because PropBank has higher lexical coverage than FrameNet, whenever the semantic frames cannot be recognized, the QA system falls back on the predicate-argument structure identified in questions and documents. This back-off mechanism enables (1) indexing and retrieving relevant passages from document collections by using lexico-semantic knowledge; and (2) the recognition of the event structure referred to by questions and answers. The Probabilistic Inference Networks (PINs) described in Section 5.2 select the answer structures and identify the answers to be returned.

¹ OWL is a markup language for the semantic web (http://www.semanticweb.org) which allows for the specification of ontologies and the semantic markup of documents in an XML format on the web.
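The back-off from frame structures to predicate-argument structures can be illustrated with a minimal sketch. The function name, the dictionary layout and the example parser outputs below are hypothetical and chosen only for illustration; the paper does not specify how the two parsers expose their results.

```python
from typing import Optional


def choose_semantic_structure(frame_parse: Optional[dict],
                              pas_parse: Optional[dict]) -> Optional[dict]:
    """Prefer a FrameNet-style frame parse; back off to a PropBank-style
    predicate-argument structure when no frame was recognized."""
    if frame_parse:
        return {"source": "FrameNet", **frame_parse}
    if pas_parse:
        return {"source": "PropBank", **pas_parse}
    return None


# Hypothetical parser outputs for Q1: assume no frame was recognized.
q1_frames = None
q1_pas = {
    "predicate": "stimulate",
    "args": {"ARG1": "India's missile program"},
}

result = choose_semantic_structure(q1_frames, q1_pas)
print(result["source"])
```

The same selection would be applied to questions and to document passages, so that both sides of the match are expressed in the same kind of structure.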

2.1  Predicate  and  Frame  Structures 

Proposition Bank, or PropBank, is a one-million-word corpus annotated with predicate-argument structures, which were described in (Kingsbury et al., 2002). The corpus consists of the Penn Treebank 2 Wall Street Journal texts (www.cis.upenn.edu/~treebank). For every given predicate lexicalized by a verb, a set of arguments sequentially numbered from Arg0 to Arg5 were annotated. The general procedure was to select for each verb the roles that seem to occur most frequently and use these roles as mnemonics for the predicate arguments. Generally, Arg0 would stand for agent, Arg1 for direct object or theme, whereas Arg2 represents indirect object, benefactive or instrument, but mnemonics tend to be verb specific. For example, the argument structure for the verb-predicate steal has Arg0:agent, Arg1:theme, Arg2:source, and Arg3:beneficiary. Additionally, the arguments may include functional tags from Treebank, e.g. ArgM-DIR indicates a directional, ArgM-LOC indicates a locative, and ArgM-TMP stands for a temporal.

The FrameNet project annotates roles defined for each semantic frame. A frame is a schematic representation of situations involving various participants, props and other conceptual roles, all called Frame Elements (FEs). For example, the frame THEFT describes situations in which a Perpetrator takes Goods that belong to the Victim. The Means by which this is accomplished may also be expressed. The British National Corpus is used for annotations.

(Gildea and Jurafsky, 2002) and (Gildea and Palmer, 2002) report on the same statistical method that labels argument roles from PropBank or FEs from FrameNet on any English sentence that is syntactically parsed. Their method consists of two classification tasks: (1) identifying the parse tree constituents corresponding to the predicate arguments or the FEs; and (2) recognizing the role of the argument or FE. They have introduced seven features that (a) were used for training both classifiers; and (b) worked both for PropBank and FrameNet. In (Surdeanu et al., 2003) seven additional features were proposed that enhanced the performance of the classifiers. By using both sets of features in our implementation, which uses the SVM-light software available from http://svmlight.joachims.org, we automatically transformed question Q3 into the predicate-argument structure PAS(Q3) and the Frame Structure FS(Q3):


The expected answer, as predicted by PAS(Q3), is the Arg1 of the predicate ‘steal’, when the Arg2 has the head ‘Russian navy’. Additionally, the answer needs to be in the same semantic class as ‘nuclear materials’. The FEs from FS(Q3) show that we should search for an FE with the role Goods whenever we find a target word of the frame STEAL. The paragraphs containing candidate answers are parsed similarly. For example, the correct answer A(Q3) is transformed into the predicate-argument structure PAS(A(Q3)) and the Frame Structure FS(A(Q3)):


A(Q3): Russia’s Pacific Fleet has also fallen prey to nuclear theft; in 1/96, approximately 7 kg of HEU was reportedly stolen from a naval base in Sovetskaya Gavan.

PAS(A(Q3)): [Arg1(P1): Russia’s Pacific Fleet] has [ArgM-DIS(P1): also] [Predicate(P1): fallen] [Arg1(P1): prey to nuclear theft]; [ArgM-TMP(P2): in 1/96], [Arg1(P2): approximately 7 kg of HEU] was [ArgM-ADV(P2): reportedly] [Predicate(P2): stolen] [Arg2(P2): from a naval base] [Arg3(P2): in Sovetskaya Gavan]

FS(A(Q3)): [VICTIM: Russia’s Pacific Fleet] has also fallen prey to [GOODS: nuclear] [target-Predicate(P1): theft]; in 1/96, [GOODS(P2): approximately 7 kg of HEU] was reportedly [target-Predicate(P2): stolen] [VICTIM(P2): from a naval base] [SOURCE(P2): in Sovetskaya Gavan]

In PAS(A(Q3)) we identify two predicates, indexed P1 and P2. P2 is lexicalized with the same word-lemma as the predicate from Q3, thus its Arg1(P2), ‘approximately 7 kg of HEU’, provides the exact answer. Note that its Arg2(P2) is ‘a naval base’, which has a meronym relation with the previously mentioned NP ‘Russia’s Pacific Fleet’, a meronym of ‘Russian navy’. The same meronymy needs to be resolved between the FE Victim of ‘stolen’ and the FE Victim of ‘theft’ in FS(A(Q3)). In the second case the meronymy is identified since the second frame identifies an event which is an example of the event identified by the first frame.
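The lemma match between the question predicate and the predicates of the answer passage can be sketched as follows. This is a simplified illustration, not the system's implementation: the toy lemma table, the dictionary layout and the function names are assumptions, and the meronymy check between ‘a naval base’ and ‘Russian navy’ is deliberately left out.

```python
# Toy lemma table (assumption): a real system would use a lemmatizer.
LEMMAS = {"stolen": "steal", "fallen": "fall"}


def lemma(word: str) -> str:
    return LEMMAS.get(word, word)


# PAS(Q3): the expected answer fills Arg1 of the predicate 'steal'.
pas_q3 = {
    "predicate": "stolen",
    "answer_role": "Arg1",
    "args": {"Arg1": "kind of nuclear materials",
             "Arg2": "from the Russian navy"},
}

# PAS(A(Q3)), simplified to one value per argument label.
pas_a_q3 = [
    {"predicate": "fallen",
     "args": {"Arg1": "Russia's Pacific Fleet", "ArgM-DIS": "also"}},
    {"predicate": "stolen",
     "args": {"ArgM-TMP": "in 1/96",
              "Arg1": "approximately 7 kg of HEU",
              "ArgM-ADV": "reportedly",
              "Arg2": "from a naval base",
              "Arg3": "in Sovetskaya Gavan"}},
]


def extract_answer(question_pas: dict, passage_pas: list):
    """Return the argument of the first passage predicate that shares a
    lemma with the question predicate and fills the expected role."""
    target = lemma(question_pas["predicate"])
    for pas in passage_pas:
        if lemma(pas["predicate"]) == target:
            return pas["args"].get(question_pas["answer_role"])
    return None


answer = extract_answer(pas_q3, pas_a_q3)
print(answer)
```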


Q3: What kind of nuclear materials were stolen from the Russian navy?

PAS(Q3): What [Arg1: kind of nuclear materials] were [Predicate: stolen] [Arg2: from the Russian navy]?

FS(Q3): What [GOODS: kind of nuclear materials] were [target-Predicate: stolen] [VICTIM: from the Russian navy]?


2.2  Topic  Models 

In question processing two objects need to be identified: (1) the expected answer type and (2) the focus of the question. For example, in question Q2: “How can a biological weapons program be detected?”, the expected answer type is MANNER(of detection) and the focus is ‘biological weapons program’. When processing complex questions the role of the focus becomes more important, since it guides the recognition of the topic model associated with the question, which in turn enables the identification of partial answers and the relations between them. To identify the expected answer type, we can rely on the question stem (e.g. “How”) and its associated semantic classes, or we can determine the answer type by using a combination of features associated with the question stem and one or more of the question words. For example, the question “How long does it take to produce weapons of mass destruction?” has the answer type Time_Span, determined by the combination of the stem ‘how’ and the adverb ‘long’. This information is much more relevant for identifying the expected answer type than the fact that the predicate ‘take’ has ArgM = ‘how long’ and Arg2 = ‘produce weapons of mass destruction’, which represents the focus of the question.
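A minimal sketch of this two-level lookup is given below. The two tables contain only the examples mentioned in the text; their layout and the function name are assumptions made for illustration, and a real answer type hierarchy would be much larger.

```python
# Answer types keyed by (stem, following word), then by stem alone.
# Only the examples from the text are listed (illustrative tables).
STEM_PLUS_WORD = {("how", "long"): "Time_Span"}
STEM_ONLY = {"how": "MANNER"}


def expected_answer_type(question: str):
    """Try the stem combined with the next word first, then the stem alone.
    Returns None when the stem is ambiguous or unknown (e.g. 'what')."""
    tokens = question.lower().rstrip("? ").split()
    if not tokens:
        return None
    stem = tokens[0]
    if len(tokens) > 1 and (stem, tokens[1]) in STEM_PLUS_WORD:
        return STEM_PLUS_WORD[(stem, tokens[1])]
    return STEM_ONLY.get(stem)


print(expected_answer_type(
    "How long does it take to produce weapons of mass destruction ?"))
print(expected_answer_type(
    "How can a biological weapons program be detected?"))
```

A stem such as “what” returns no type here, which matches the observation in Section 2 that Q1 needs predicate-argument and topic information to resolve its answer type.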

Complex questions rely on topic models for finding the answer since it is unlikely that the exact answer to a complex question can be found in a text collection; it is more likely that partial answers can be detected and then combined to generate the most informative answer. We used an incremental topic representation that was introduced in (Harabagiu, 2004). Information about a topic is modeled through two incremental enhancements of the topic signatures introduced in (Lin and Hovy, 2000). The first enhancement determines a set of seed relations. The methodology considers:

(1) filtering out outliers of the terms identified as relevant with the statistical method based on likelihood ratio reported in (Lin and Hovy, 2000);

(2) morphological expansion of the nouns and verbs from the topic signature;

(3) semantic normalization through the NER and an off-line ontology of 22,000 words; and

(4) selection of the topic seeds with the same likelihood ratio method applied for acquiring the topic concepts. The seeds are the most relevant [Verb-Noun] pairs which have a predicate-argument relationship.

In our example, words like ‘say’, ‘have’ or ‘identify’ were filtered out, leaving words like ‘weapons’, ‘sarin’ and ‘produce’ as the most relevant topic concepts. The morphological expansion added words like ‘production’, whereas the semantic normalization unified ‘Russian’ and ‘Iraqi’ into NATIONALITY and ‘bomb’ or ‘building’ into ARTIFACT.

The seed relation that was selected for question Q3 is [develop - program]. The relation is further used to produce a corpus of paragraphs related to the corpus, from which new topic relations can be extracted. Two types of relations are targeted: (1) syntax-based relations (e.g. Verb - Subject, Verb - Object and Verb - Prepositional Attachment) and (2) salience-based relations, which model long-dependency relations to a seed concept. The relations are ranked based on a methodology introduced in (Riloff, 1996): each relation is ranked based on its Relevance-Rate and its Frequency. The Frequency of an extracted relation counts the number of times the relation is identified in the relevant paragraphs. The Relevance-Rate = Frequency / Count, where Count measures the number of times an extracted relation is recognized in any paragraph considered.
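The two statistics can be computed as in the sketch below. The input layout (a list of paragraphs, each with its extracted relations and a relevance flag) and the example relations are invented for illustration. How Relevance-Rate and Frequency are combined into a single rank is not stated in the text, so sorting by Relevance-Rate first and Frequency second is an assumption.

```python
from collections import Counter


def rank_relations(paragraphs):
    """paragraphs: list of (relations, is_relevant) pairs, where relations
    is a list of (verb, noun) tuples extracted from one paragraph.

    Frequency      = occurrences of a relation in relevant paragraphs
    Count          = occurrences of a relation in any paragraph
    Relevance-Rate = Frequency / Count
    """
    frequency, count = Counter(), Counter()
    for relations, is_relevant in paragraphs:
        for rel in relations:
            count[rel] += 1
            if is_relevant:
                frequency[rel] += 1
    ranked = [(rel, frequency[rel] / count[rel], frequency[rel])
              for rel in frequency]
    # Assumed tie-breaking: higher Relevance-Rate first, then Frequency.
    ranked.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return ranked


# Invented toy data, not taken from the paper's corpus.
paragraphs = [
    ([("develop", "program"), ("produce", "weapons")], True),
    ([("develop", "program")], True),
    ([("produce", "weapons"), ("say", "official")], False),
    ([("say", "official")], True),
]

ranked = rank_relations(paragraphs)
for relation, rate, freq in ranked:
    print(relation, rate, freq)
```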

This ranking allows us to select a new topic relation, and to resume the topic modeling procedure, this time on a new corpus generated by the most recently discovered relation. We stop the discovery process when we have identified 20 topic relations. Some of the topic relations discovered for question Q2 are illustrated in Figure 2.

The second enhancement of topic representations reported in (Harabagiu, 2004) considers the notion of topic theme, which associates clusters of topic relations with text segments. The segmentation is produced by the TextTiling algorithm (Hearst, 1997). The nominalization of the verb corresponding to the most relevant topic relation in a segment is considered to be linked to the nominalization from the following topic-relevant segment. Such segments are called themes and the chains of nominalizations represent possible paths of actions. Two such paths are represented in Figure 2.

3  From Semantic Extraction to Inference for QA

Semantic extraction allows us to identify predications in the input text. For processing complex questions we further identify the question class or the question pattern as well as relevant parts of the scenario, which we refer to as the topic model. A significant gap remains between a) the unstructured and intuitively chosen tag sets used in FrameNet or PropBank and the relation names and clusters in the topic model and b) a formal characterization of the interrelated events, actions, states and relations holding among them. The explicit representation of such frame semantic and event structure information is needed for the potential use of such resources for question answering.

In previous work (Chang et al., 2002), we bridged the gap by defining a formalism that unpacks the shorthand of frames into structured event representations. This allows annotated FrameNet data to parameterize event simulations (Narayanan, 1999) that produce fine-grained, context-sensitive inferences. We have extended this work to further incorporate the topic model and theme described earlier. Currently, the list of extracted predicate-argument structures, the topic model and the answer type predicate are used to index into a set of parameterized event representations instantiated to specific values based on the extracted predicate-argument bindings (see Figure 3). The answer type predicate translates to a specific inference procedure.

Figure 3 (middle) shows the representation of extracted predicate-argument bindings in our parameterized event formalism, Embodied Construction Grammar (ECG) (Bergen and Chang, in press), which maps annotations to event simulations. ECG is a constraint-based formalism similar in many respects to other unification-based linguistic formalisms such as HPSG or LFG (features, roles, constraints, simple and complex slots, subcasing, and a self reference). ECG differs from other linguistically motivated proposals in that 1) it uses an evokes relation that models the priming of a background schema (role inheritance is lazy and explicitly specified) and 2) the complex network of conceptual schemas in ECG is designed to map utterances to mental simulations in context to produce a rich set of inferences. It is thus ideally suited for our current goal of translating frames to conceptual representations. Figure 3 (middle left) shows the theft schema instantiated to the bindings extracted from the answer passage. Figure 3 (middle right) shows the schema instance enhanced with inferentially derived additional bindings.

Figure 3 (top):

A(Q3): Russia’s Pacific Fleet has also fallen prey to nuclear theft; in 1/96, approximately 7 kg of HEU was reportedly stolen from a naval base in Sovetskaya Gavan.

FS(A(Q3)): [VICTIM: Russia’s Pacific Fleet] has also fallen prey to [GOODS: nuclear] [target-Predicate(P1): theft]; in 1/96, [GOODS(P2): approximately 7 kg of HEU] was reportedly [target-Predicate(P2): stolen] [VICTIM(P2): from a naval base] [SOURCE(P2): in Sovetskaya Gavan]

Figure 3 (middle left):

SCHEMA INSTANCE: FN:THEFT
Subcase_of: FN:Committing_Crime
Subcase_of: FN:Take
Evokes: FN:Crime_Scenario as FNC
Roles
    VICTIM: "Russian Navy, Pacific Fleet, Naval Base"
    GOODS: "approx. 7 KG of HEU"
    SOURCE: "in Sovetskaya Gavan"

Figure 3 (middle right):

SCHEMA INSTANCE: FN:THEFT
Subcase_of: FN:Committing_Crime
Subcase_of: FN:Take
Evokes: FN:Crime_Scenario as FNC
Roles
    PERPETRATOR: ?x:AGENT
    VICTIM: "Russian Navy, Pacific Fleet, Naval Base"
    GOODS: "approx. 7 KG of HEU"
    MEANS: ?m
OWN(?PERPETRATOR, "approx. 7 KG HEU")

[Figure 3 (bottom): FN:THEFT CPRM diagram. Recoverable node labels: own(VICTIM, GOODS), at(SOURCE, GOODS), at(SOURCE, PERP), THEFT(?MEANS), own(PERP, GOODS), SELL(PERP, GOODS), Crime, Committing Crime, Crime Scenario (evoked).]

Figure 3: From Semantic Extraction to Inference

Figure 3 (bottom) shows a fragment of the event simulation for the THEFT frame (all the information in this simulation is generated from information in the FrameNet database). Preconditions and world states that obtain before the event include a) the VICTIM owns the GOODS, b) the PERPETRATOR is at the SOURCE and c) the GOODS are at the SOURCE. The THEFT event can be a simple transition or can zoom in to a complex event with phases (such as start, ongoing, finish, interrupt, cancel, resume, stop). Complex events can include monitoring and detection conditions as well as resource production, consumption and locking. The completion of THEFT results in a) the PERPETRATOR owning the GOODS and b) the evocation of the crime scenario schema, which gets simulated if other conditions obtain (such as AUTHORITIES notice the crime). The effect of one action may probabilistically enable, disable, interrupt, or terminate other possible events (such as OWN provides evidence for the future SELL event). Running the inference process for this example yields 1) the identification of relevant unbound roles (PERPETRATOR and MEANS) and 2) highly probable new assertions and bindings (the perpetrator owns the goods after the theft). Result 1) suggests new scenario-based query expansion strategies and follows from updating the state variables after the new evidence (extracted predicate-arguments) is asserted; this process is called filtering. Result 2) is the resultant state after executing the action and is computed by a) executing the action and identifying reachable states and b) updating the state after the action to find the Maximum A Posteriori (MAP) probabilities. These procedures are amongst the important inference methods for structured stochastic processes and are directly supported by our implementation.
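The two outputs of this example (unbound roles and new assertions) can be illustrated with a deliberately simplified, deterministic sketch. The actual system performs probabilistic inference in a CPRM; the role list, fact encoding and function names below are assumptions made for illustration only.

```python
# THEFT roles, preconditions and effects as listed in the text.
ROLES = ("PERPETRATOR", "VICTIM", "GOODS", "SOURCE", "MEANS")
PRECONDITIONS = [
    ("own", "VICTIM", "GOODS"),
    ("at", "SOURCE", "PERPETRATOR"),
    ("at", "SOURCE", "GOODS"),
]
EFFECTS = [("own", "PERPETRATOR", "GOODS")]


def ground(fact, bindings):
    """Replace role names by their bound values; unbound roles become
    placeholder variables such as '?perpetrator'."""
    predicate, *args = fact
    return (predicate, *[bindings.get(a, "?" + a.lower()) for a in args])


def run_theft(bindings):
    """Return the unbound roles, the grounded preconditions and the
    grounded effects of a THEFT event for the given role bindings."""
    unbound = [role for role in ROLES if role not in bindings]
    before = [ground(fact, bindings) for fact in PRECONDITIONS]
    after = [ground(fact, bindings) for fact in EFFECTS]
    return unbound, before, after


# Bindings extracted from the answer passage A(Q3).
bindings = {
    "VICTIM": "Russian Navy",
    "GOODS": "approx. 7 kg of HEU",
    "SOURCE": "Sovetskaya Gavan",
}

unbound, before, after = run_theft(bindings)
print("unbound roles:", unbound)
print("holds before:", before)
print("holds after:", after)
```

The unbound roles returned here correspond to the candidates for scenario-based query expansion, and the grounded effect corresponds to the new assertion that the perpetrator owns the goods.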

Technically, the event structure implementation uses a factorized model of states based on Temporally Extended (aka Dynamic) Probabilistic Relational Models (Murphy, 2002; Pfeffer, 2000; Getoor et al., 2001) that enable a variety of inferences that update and revise the state variables (forward and backward in time). Central to the representation of actions and events is an event model called executing schemas (or x-schemas), motivated by research in both sensorimotor control and cognitive semantics (Narayanan, 1997). X-schemas are active structures based on Stochastic Petri Nets (Ciardo et al., 1994) that cleanly capture sequentiality, concurrency and event-based asynchronous control². Our implementation integrates the PRM-based state model with the x-schema-based action model and is called Coordinated Probabilistic Relational Models or CPRM. Our CPRM implementation, KarmaSIM, is linked to existing linguistic resources (FrameNet and WordNet) and to ontologies on the semantic web. To address the vexing issue of domain-specific Knowledge Acquisition (KA), in past work we have constructed automatic translators from OWL-based event and process ontologies (such as OWL-S) to the CPRM modeling framework, KarmaSIM (Narayanan and McIlraith, 2003). WordNet, OpenCYC, and SUMO are also available in OWL. For the experiments reported here, we used the OWL-based Teknowledge WMD ontology³ to instantiate the general frames obtained from FrameNet⁴. The CPRM model populated with domain knowledge functions as a QA system component for answer extraction (see Figure 2).

² X-schemas have been shown to provide a cognitively motivated basis for modeling diverse event-structure related linguistic phenomena, including aspectual inference (Chang et al., 2002), metaphoric inference (Narayanan, 1997) and event-based reasoning in narrative understanding (Narayanan, 1999).
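To make the Petri net basis of x-schemas concrete, the following sketch shows token-based enabling and firing for a single THEFT transition. It is a plain, non-stochastic Petri net written for illustration; firing rates, durations and the other extensions used by x-schemas and KarmaSIM are not modeled, and the place names and data layout are assumptions.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= needed
               for place, needed in transition["inputs"].items())


def fire(marking, transition):
    """Consume tokens from input places and produce tokens in output
    places, returning a new marking (the old one is left untouched)."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, needed in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - needed
    for place, produced in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + produced
    return new_marking


# THEFT as one transition: preconditions are input places,
# the resulting ownership is the output place.
theft = {
    "inputs": {"own(VICTIM,GOODS)": 1,
               "at(SOURCE,PERP)": 1,
               "at(SOURCE,GOODS)": 1},
    "outputs": {"own(PERP,GOODS)": 1},
}

initial = {"own(VICTIM,GOODS)": 1, "at(SOURCE,PERP)": 1, "at(SOURCE,GOODS)": 1}
after_theft = fire(initial, theft)
print(after_theft)
```

Returning a new marking instead of mutating the old one keeps earlier states available, which is convenient when enumerating reachable states.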

We have developed a protocol that allows us to take predicates and frames extracted from the input text and perform a variety of causal and event structure related inferences for QA. Currently, the main API between the semantic extraction and inference components makes use of 1) extracted predicate-argument structures, 2) extracted topic models and 3) a set of extracted answer type predicates. The topic models provide an index into the CPRM model database (compiled from existing FrameNet and Semantic Web (OWL-based) databases). CPRM models matching the topic model are retrieved and instantiated by the predicate-argument bindings specified by the semantic parse output. The answer type predicates are mapped to specific structured probabilistic inference procedures afforded by the CPRM models. The next section outlines the currently implemented CPRM inference algorithms and their use for question and answer processing.

4  Inference  With  CPRMs  for  QA 

Inference in structured probabilistic models of
dynamic systems (as in the CPRM model) consists of
the following kinds of computations. Here $X_t$ is a
state variable at time $t$ (lowercase $x_t$ is a value
assignment), and $y_t$ is an observation value at time $t$.

Filtering: Compute $P(X_t \mid y_{1 \ldots t})$. State update based
on the observation sequence.

Prediction: Compute $P(X_{t+h} \mid y_{1 \ldots t})$. Predict the
state at some future time $t+h$ based on the
observation sequence up to time $t$.

Smoothing: Compute $P(X_{t-m} \mid y_{1 \ldots t})$. Recompute
previously estimated states in the presence of current
evidence.

MAP: Compute $\arg\max_{x_{1 \ldots t}} P(x_{1 \ldots t} \mid y_{1 \ldots t})$.
Compute the best assignment of state values given the
observation sequence.

Reachability: Given a CPRM $S$ with an initial state
$X_i$ and a final state $X_f$, is $X_f \in \mathcal{R}(S, X_i)$?
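
To make these computations concrete, the sketch below implements filtering, prediction, smoothing, MAP estimation and reachability for a plain two-state hidden Markov model. This is a deliberate simplification: a CPRM is a much richer structured model, and nothing here is taken from the implemented system. The function names, the matrices T (transition) and E (emission), and the toy numbers are all assumptions made for illustration. Reachability is reduced here to graph search over transitions with non-zero probability.

```python
def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def step(belief, T):
    """Push a belief over states one time step forward through T."""
    n = len(T)
    return [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]


def filtering(prior, T, E, obs):
    """P(X_t | y_1..t); prior is P(X_1), obs holds y_1..y_t."""
    n = len(prior)
    belief = normalize([prior[i] * E[i][obs[0]] for i in range(n)])
    for y in obs[1:]:
        predicted = step(belief, T)
        belief = normalize([predicted[j] * E[j][y] for j in range(n)])
    return belief


def prediction(prior, T, E, obs, h):
    """P(X_{t+h} | y_1..t): filter, then project h steps without evidence."""
    belief = filtering(prior, T, E, obs)
    for _ in range(h):
        belief = step(belief, T)
    return belief


def smoothing(prior, T, E, obs, m):
    """P(X_{t-m} | y_1..t) by forward-backward; requires 0 <= m < len(obs)."""
    n, k = len(prior), len(obs) - m
    forward = filtering(prior, T, E, obs[:k])
    backward = [1.0] * n
    for y in reversed(obs[k:]):
        backward = [sum(T[i][j] * E[j][y] * backward[j] for j in range(n))
                    for i in range(n)]
    return normalize([forward[i] * backward[i] for i in range(n)])


def map_sequence(prior, T, E, obs):
    """Most probable state sequence x_1..t given y_1..t (Viterbi)."""
    n = len(prior)
    score = [prior[i] * E[i][obs[0]] for i in range(n)]
    backpointers = []
    for y in obs[1:]:
        prev = [max(range(n), key=lambda i: score[i] * T[i][j])
                for j in range(n)]
        score = [score[prev[j]] * T[prev[j]][j] * E[j][y] for j in range(n)]
        backpointers.append(prev)
    path = [max(range(n), key=lambda i: score[i])]
    for prev in reversed(backpointers):
        path.append(prev[path[-1]])
    return path[::-1]


def reachable(T, start, goal):
    """Is goal reachable from start via transitions of non-zero probability?"""
    seen, frontier = {start}, [start]
    while frontier:
        i = frontier.pop()
        if i == goal:
            return True
        for j, p in enumerate(T[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                frontier.append(j)
    return False


# Toy model: state 0/1, observation 0/1 (numbers chosen for illustration).
prior = [0.5, 0.5]
T = [[0.7, 0.3], [0.3, 0.7]]
E = [[0.1, 0.9], [0.8, 0.2]]
obs = [1, 1]

print("filtering :", filtering(prior, T, E, obs))
print("prediction:", prediction(prior, T, E, obs, h=1))
print("smoothing :", smoothing(prior, T, E, obs, m=1))
print("MAP       :", map_sequence(prior, T, E, obs))
print("reachable :", reachable(T, 0, 1))
```

Sequential application of procedures, as used for the answer types discussed next, corresponds to feeding the output of one of these functions into another, for example projecting a filtered belief forward before testing reachability.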

We compiled a list of complex, semantically rich,
high-frequency answer types for questions in the
AQUAINT CNS data.® The top four categories were
1) support/justification for a proposition, 2) the
ability of an agent to perform a specific act, 3)
temporal projection or predictions from a state, and 4)
hypothetical situations (including counterfactuals).

®http://www.reliant.teknowledge.com/DAML/WMD.owl

^The compilation process is not completely automated,
since none of the OWL ontologies were rich enough to cover
our event structure model. For the experiment, we restricted
any information added to the OWL-based ontologies to the
class documentation strings provided in the ontology. We are
currently trying to use semantic extraction to automatically
generate this information from the documentation.

®AQUAINT is an ARDA-sponsored QA program. The
Center for Nonproliferation Studies (CNS) data is a data source
released to the AQUAINT project.


In our model, these map straightforwardly into the
running of various inference procedures (including
their sequential application) described in Section 3.
For counterfactuals, we use the idea of model
intervention proposed by Pearl (2000). The exact
details of the algorithm for counterfactuals are outside
the scope of this paper. Table 1 summarizes the
various query types and the corresponding inference
algorithms. We do not know of any previously
implemented QA system (going from text to inference)
capable of handling these kinds of questions.


Answer Type               Inference Type
Just(Proposition)         MAP
Ability(Agt, Act)         F; S
Prediction(State)         P; R; MAP
Hypothetical(I, State)    F; R_I

Table 1: The type of answer required and the inference
algorithm used in the CPRM model. Here MAP stands
for Maximum A Posteriori estimation, F for filtering, S
for smoothing, R for reachability, and P for predictive
inference. The symbol ; indicates sequential application.
The symbol I represents a specific intervention into the
CPRM network (Pearl, 2000) as specified by the
hypothetical condition. Computing reachability after the
intervention is given by R_I.
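
The mapping in Table 1 can be read as a dispatch table from answer types to sequences of inference procedures. The sketch below shows one way such a dispatch might be organized. It is only an illustration under stated assumptions: the dictionary and function names are hypothetical, the procedures are stubs that merely record their own name, and the entry for the hypothetical answer type reflects one reading of the table rather than a confirmed specification.

```python
# Hypothetical encoding of Table 1: answer type -> ordered inference steps.
# "R_I" denotes reachability computed after an intervention I.
ANSWER_TYPE_TO_INFERENCE = {
    "Just": ["MAP"],
    "Ability": ["F", "S"],
    "Prediction": ["P", "R", "MAP"],
    "Hypothetical": ["F", "R_I"],  # assumed reading of the table entry
}


def run_pipeline(answer_type, procedures, model, evidence):
    """Apply the procedures for an answer type in sequence.

    Each procedure takes (model, current_result) and returns a new result,
    so the output of one step is the input of the next.
    """
    result = evidence
    for name in ANSWER_TYPE_TO_INFERENCE[answer_type]:
        result = procedures[name](model, result)
    return result


def make_stub(name):
    """Stub procedure that records which step ran (stands in for real inference)."""
    def procedure(model, result):
        return result + [name]
    return procedure


stub_procedures = {name: make_stub(name)
                   for name in ["F", "S", "P", "R", "MAP", "R_I"]}

trace = run_pipeline("Prediction", stub_procedures, model=None, evidence=[])
print(trace)
```

Keeping the table as data rather than as branching code makes it straightforward to add a new answer type by adding one entry.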


5  Evaluating  Semantically  based  QA 

The previous sections described techniques to
incorporate semantic components at increasing levels of
depth and complexity. We now report on
experiments conducted to evaluate the utility of these
differing components. We report on results pertaining
to the impact of (1) the identification of semantic
structures and (2) inference through CPRMs on a
baseline state-of-the-art Q/A system that emerged
after five years of TREC evaluations.

5.1  Evaluating  semantic  information 

To evaluate our novel QA architecture we have used
a set of 400 questions pertaining to four different
topics: (T1) UN inspections; (T2) Thefts in Russia’s
nuclear navy; (T3) Status of India’s Prithvi ballistic
missile project; and (T4) China’s participation in
non-proliferation regimes. For each topic we have created
a gold standard consisting of (1) 100 questions; (2)
one or several text spans considered correct answers
by two independent judges; (3) the syntactic parse
produced by the Collins parser (Collins, 1996), which
was manually corrected; (4) the predicate-argument
structures of each question and its corresponding
answer, produced automatically and then corrected
manually; and (5) the semantic frames whenever they
could be identified. The answers were extracted
from the AQUAINT CNS corpus. The gold standard
was used for evaluating the precision (P(Arg)) and
recall (R(Arg)) of identifying the correct boundaries
of predicate arguments. We have also computed the
F1-score as

$$F_1(Arg) = \frac{2 \cdot P(Arg) \cdot R(Arg)}{P(Arg) + R(Arg)}.$$


Corpus        P(Arg)    R(Arg)    F1(Arg)
PropBank      85.4      85.6      85.5
AnswerBank    89.4      89.5      89.4

Corpus        P(Role)   R(Role)   F1(Role)
PropBank      88.5      92.7      90.5
AnswerBank    86.8      95        90.7

Table 2: Identification of predicate-argument structures.


Corpus        P(FE)     R(FE)     F1(FE)
FrameNet      75.2      77        76.08
AnswerBank    73.5      74        73.74

Corpus        P(Role)   R(Role)   F1(Role)
FrameNet      91.57     89.13     90.33
AnswerBank    90.2      88.5      89.34

Table 3: Identification of frame structures.


Table 2 presents the results. The table also lists the
precision of classifying the arguments (P(Role)), the
recall for argument classification (R(Role)) and the
corresponding F1-score. The results are presented for
two corpora: PropBank section 23, and AnswerBank,
which represents our gold standard. Table 3 presents
similar results for recognizing the boundaries of frame
elements (FEs) from FrameNet and for classifying
their semantic roles.
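
The F1-score used in Tables 2 and 3 is the harmonic mean of precision and recall. As a quick consistency check, the short Python sketch below recomputes it from the precision and recall values listed in the two tables. The function name f1_score and the dictionary layout are our own choices for illustration. The recomputed values agree with the listed ones to within small rounding differences.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, listed F1) as given in Tables 2 and 3.
listed = {
    "PropBank Arg": (85.4, 85.6, 85.5),
    "AnswerBank Arg": (89.4, 89.5, 89.4),
    "PropBank Role": (88.5, 92.7, 90.5),
    "AnswerBank Role (PAS)": (86.8, 95.0, 90.7),
    "FrameNet FE": (75.2, 77.0, 76.08),
    "AnswerBank FE": (73.5, 74.0, 73.74),
    "FrameNet Role": (91.57, 89.13, 90.33),
    "AnswerBank Role (FS)": (90.2, 88.5, 89.34),
}

differences = {}
for name, (p, r, f) in listed.items():
    computed = f1_score(p, r)
    differences[name] = abs(computed - f)
    print(f"{name}: computed {computed:.2f}, listed {f}")
```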

5.2  Evaluating  the  CPRM  model  for  QA 

We experimented with the QA system on the
AQUAINT CNS data. Since there are no
implemented QA systems that perform the kinds of
complex inferences described above, our evaluation with
respect to the current state-of-the-art baseline
relates to the enhanced set of questions and answer
types our system can handle. We wanted to calibrate
the extent and type of inferences needed for different
questions in the CNS scenario data as well as the
extent to which such inferences require manual domain
model building. To this end, we created a set of 400
hand-annotated question-answer passages for the
gold standard. We measured the performance of our
system along the following dimensions. 1) How
well did the automatically constructed CPRM
domain models (from the OWL ontologies) fare when
compared to the manually constructed (from
gold-standard CNS data) CPRM model? 2) How
capable was our CPRM event model in performing a set
of complex event-structure-based inferences required
for QA?

To test (1), we manually compiled CPRM domain
models based on our core theory of events and on
the gold standard annotations (we used a
60-20-20 build-validate-test dataset). We compared this
to the model semi-automatically generated from the OWL
databases of WMD processes. For our first
experiment, we looked at how many of the complex,
semantically rich inference types could be made by
our system for the two models. Figure 4 shows the


[Figure 4 plot: percent correct by inference type]

Figure 4: Performance of the CNS-based (gold standard)
and OWL-derived CPRM models based on inference type

performance of the two systems on the CNS
gold-standard annotations (the results are for the test
data of 80 questions). Note that both the manually
built and the OWL-based models perform
reasonably well for the different inference types we looked
at. This is somewhat encouraging given that this is
the first inference-based QA system (that we are
aware of) that goes from textual input to
inference. The main shortcoming of the OWL-derived
models was that they lacked detailed specifications
of the processes, their resource requirements, and
a detailed list of agent abilities, preconditions,
effects and maintenance conditions. We are seeking to
overcome this deficiency through a variety of
automatic techniques, semantic web resources, and
Subject Matter Expert (SME) input using the CPRM
GUI to bootstrap and enhance the acquisition of
domain-specific knowledge. However, results from
these efforts remain future work.

To test (2), we looked at the percentage of
inferences, by type of event-structure inference,
that had to be made to generate the answer
for the questions in the 400 gold standard
annotations. The categories we looked at were
aspectual inferences (phases of events, viewpoints
(zoom-in, zoom-out)), action and process-feature
inferences (preconditions, effects, resources (produced,
consumed, locked)), and metaphoric inferences (we only
looked at Event Structure Metaphors (Lakoff 1999)).
We counted the number of inferences made by the
human and by the model (the CNS-based manually
built model) for each category in the annotated data.
We looked at the precision (number of correct
inferences) and recall (number of total made).®

Table  4  shows  our  initial  results.  Note  that  all 


®We computed an f-score based on $2PR/(P+R)$ for both the
CNS gold-standard-based CPRM model and for the
OWL-derived model.


Component         Number    M1f    M2f
Aspectual         375       .74    .65
Action-feature    459       .62    .45
Metaphor          149       .70    .62

Table 4: Inferences broken down by Event Structure
component. M1f refers to the f-score of the manually
constructed CNS gold-standard model, M2f to the model
derived from OWL.


AH           PAS          FS           TM
49 (12%)     130 (32%)    78 (19%)     42 (10%)

PAS+TM       FS+TM        ES+TM        ES+Inf
141 (35%)    94 (23.5%)   203 (50%)    294 (73.5%)

Table 5: Number of correct answer types identified by
semantic information originating in: the Answer
Hierarchy (AH), the predicate-argument structure (PAS); the
topic model (TM); the event structure (ES) and the
CPRM inference (Inf) for a set of 400 complex questions.


three of the categories of inferences are fairly
common in the data, and our initial results are quite
encouraging. The more domain-general inference types
regarding the aspectual and metaphoric inferences
about events seem to fare reasonably well (recall that
all these inferences are impossible in the
state-of-the-art baseline QA system). The lower score for the
action-feature inferences seems to be tied to the lack of
domain knowledge in our model regarding
domain-specific process details (such as the specific resources
for the production (or dispersal) of WMD). We
expect this number to increase considerably with more
domain-specific knowledge using the techniques
described earlier. We are also conducting a detailed
study of other important categories of event-related
causal inferences.

5.3  Evaluating  the  Answers 

The focus of our experiments was to measure the
impact of (1) the identification of semantic structures
and (2) inference through CPRMs on
state-of-the-art Q/A techniques that emerged after five years of
TREC evaluations. As reported in (Moldovan et al.,
2002), most of the errors of Q/A systems are
determined by (a) the incorrect identification of the
expected answer type and (b) the inability to
expand question keywords with the ideal words that
enhance the retrieval of the candidate answers.

Table 5 lists the results obtained for the
identification of correct answer types. The answer hierarchy
(AH), comprising more than 8000 WordNet concepts
and mapping into 15 name classes, was the source of
only 12% of the correctly recognized answer types,
in contrast with the more than 70% that is
correctly identified for factoid questions when
processing TREC-like data. To evaluate the contribution
of predicate-argument structures (PAS), we
considered that the answer type can be defined not only
as a semantic class, but also as an argument of a
specific predicate. Whenever the answer would be
recognized as the same argument of the same
predicate or of a directly related predicate^, we considered
that the answer type is recognized correctly.
Similarly, when the frame structures could be identified
in the question and the answer, the answer type can
be indicated by the frame element (FE), and its
correct identification counts as a correctly predicted
answer type. The topic models
(TMs) contribute to the recognition of the answer
type if any of the relations they induce pertains to
the expected answer, which may be either the
relation itself, a more complicated structure that
includes any of the topic relations, or any concept that
takes part in any topic relation but was not
accessible directly from the question words. The event
structure (ES) was considered a valid source for
finding the answer type if any of the schemas that were
instantiated contained at least a semantic class or
relation that corresponds even partially to the answer
structure, whereas the combination between ES and
the inference procedures (Inf) determines the answer
type either by considering only the semantic
information available from the ES or by adding to it the
answer types determined by inference. The results
listed in Table 5 show that the schema
instantiations, through their very general semantic coverage,
account for most of the answer types which are
recognized, whereas the addition of answer types
determined by inference brings the total to 73.5% of the
correct answer types of the evaluated complex
questions. When processing the test questions only with
the AH, 8% of the answers were correct. In
contrast, when all the other semantic structures were
available and probabilistic inference could be
performed, 52% of the extracted answers were correct.
In future work we plan to investigate ways in which
the semantic structures presented in this paper could
improve the quality of paragraph retrieval and
keyword selection.

6  Issues  and  Discussion 

The last few years have witnessed a good deal of
activity on predicate extraction (aka semantic parsing
(Gildea and Jurafsky, 2002; Kingsbury et al., 2002)).
Until now it has been unclear if and how predicate
extraction might help in the performance of an
actual NLP task. Often the intuitive justification
offered was that predicate extraction was an
intermediate step toward semantic inference (Gildea and
Jurafsky, 2002). As far as we know, the results reported
in this paper constitute the first demonstration that
sophisticated textual analysis including
predicate-argument extraction can be combined with deep
semantic representation and inference models to
enhance a state-of-the-art QA system to answer new
question types that pertain to causal and temporal
aspects of complex events. Importantly, we
believe our work demonstrates a flexible architecture
and methodology that harnesses the increasingly
widespread availability of semantically motivated
resources (such as WordNet, FrameNet, and the
Semantic Web). Our current efforts are directed at
more effective knowledge acquisition and at
expanding the coverage of the system both in terms of the
domain models and the question and answer types
supported. We believe that our flexible architecture
and CPRM-based computational model for
combining predicate and frame parsing with deep inference
could point the way for building the next generation
of semantically rich QA systems.

^Directly related predicates are those that (a) belong to
the same verb hierarchy in WordNet or (b) are arguments
of the target predicate (either because they are infinitives or
because they belong to a relative clause).
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